ABSTRACT

This paper discusses the implementation of an 
adaptive acoustic echo canceler for a hands-free cel- 
lular phone operating on a fading channel. The adap- 
tive lattice structure, which is particularly known for 
faster convergence relative to the conventional 
tapped-delay-line (TDL) structure, is used in the ini- 
tialization stage. After convergence, the lattice coeffi- 
cients are converted into the coefficients for the TDL 
structure which can accommodate a larger number of 
taps in real-time operation because of its computational
simplicity. The conversion method of the TDL coef- 
ficients from the lattice coefficients is derived and the 
DSP56001 assembly code for the lattice and TDL 
structures is included, as well as simulation results 
and the schematic diagram for the hardware imple- 
mentation. 

1.0 Introduction 

Adaptive signal processing for echo cancellation
has a variety of uses in telecommunication
applications because of multi-path and impedance
mismatches in communication channels. Echo cancellation
is required especially for full-duplex voice
transmission where the microphones and speakers are 
located in places such that an acoustic echo is created. 
One such application is a hands-free cellular phone 
which allows full duplex operation by preventing the 
phone from breaking into oscillations [1]. The ability 
to provide hands-free operation of cellular (mobile) 
phones offers users a safer and more convenient way 
to use their cellular phones while driving a car, as
shown in Figure 1.

Figure 1 Depiction of a proposed hands-free
cellular phone system

In the cellular phone application, two echo
cancellers are needed in the system: one to cancel the
phone line (electrical) echo and the other to cancel the
acoustic echo, which is the signal from the loudspeaker
echoed back into the microphone. In this paper only
the acoustic cancellation problem is considered.
Figure 2 shows the model and the hardware schematic of
the acoustic echo canceller. The adaptive algorithm,
shown as the adaptive digital filter (ADF) block in
Figure 2, minimizes the error signal, which is the
difference between the actual transmitted signal and its
estimate formed as a linear combination of the received
data samples. When the error terms are minimized, the
adaptive filter impulse response is said to have
converged to the impulse response of the echo paths.



Figure 2 Block diagram of acoustic echo canceller 

An implementation of the acoustic echo canceller
for a speakerphone application to improve performance
was introduced by one of the authors [2]. The
previous paper uses the DSP56200 cascadable adaptive
FIR filter peripheral, which implements the
conventional TDL structure with the least-mean-square
(LMS) algorithm for adaptation [3]. An acoustic echo 
canceller needs an initialization period (training) be- 
fore the phone can work properly for full-duplex op- 
eration. This may require up to 5 seconds of 
initialization time depending on the convergence 
property of the adaptive algorithm. Although the TDL 
structure [3,4] is a simple and commonly used sto- 
chastic approximation-type algorithm, the conver- 
gence time is slow, especially if the training signal is 
narrowband or has band-limited spectral content. 
Thus, pseudo-random noise that has a broad frequency
spectrum is normally used for initialization. However,
this type of random noise creates problems for
the user, who will be able to hear it during the
initialization. The user will most likely interpret this
noise as a bad (or no) connection and will hang up before
initialization is complete. In adaptive filtering
applications such as a hands-free cellular phone,
therefore, very rapid convergence of the adaptive
coefficients is a requirement.

The lattice structure based on the adaptive LMS 
algorithm has been widely accepted for applications 
where rapid transient adaptation is required and/or the 
eigenvalues of the input signal are highly disparate 
[5,6]. The lattice structure can be interpreted as
self-orthogonalizing, which has been shown to speed
up convergence. Cellular phones are normally used
inside a car, which has shorter acoustic reverberation
and echo paths than the office or conference room in
which a conventional speakerphone must function. Thus,
fewer taps, which represent the time window for the
adaptive approximation, are needed for the adaptive
process than with a conventional speakerphone. However,
due to the computational complexity of the lattice-LMS
algorithm, it is hard to apply a large number of
coefficients (stages) to accommodate 0.1 second (which
is a time window of 800 taps) of acoustic echo delay in
real time. Thus, the lattice structure is used only for
the initialization stage and the coefficients are then
converted to the TDL structure. The TDL structure is
computationally efficient and can accommodate a larger
number of coefficients in real time to cancel the long
delayed echoes.

2.0 Acoustic-Echo Canceller Model 

Depending on the characteristics of the car’s internal
acoustics, the echo may be sufficiently strong that it
must be removed at the microphone input. The term used
to describe the amount of echo which can be removed by
the echo canceller is Echo Return Loss Enhancement
(ERLE), which can be defined as [2]:

$$\mathrm{ERLE\,(dB)} = 10 \log_{10} \left( \frac{E[y^2(k)]}{E[e^2(k)]} \right) \tag{1}$$

where $E[y^2(k)]$ and $E[e^2(k)]$ are the expected
values of microphone input signal power and uncancelled
echo signal power, respectively, as shown in
Figure 2. The desired goal for ERLE is 30 dB due to
ambient noise which is not created by the echo itself [3].
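
As a concrete reading of (1), the following minimal Python sketch estimates ERLE from two sample records. It assumes the expectations are replaced by plain sample means; the function name `erle_db` and the test signal are illustrative and do not come from the paper.

```python
import math

def erle_db(mic, residual):
    """Estimate ERLE in dB, using sample means in place of E[y^2(k)] and E[e^2(k)]."""
    if len(mic) != len(residual) or not mic:
        raise ValueError("signals must be non-empty and of equal length")
    p_y = sum(v * v for v in mic) / len(mic)            # microphone input power
    p_e = sum(v * v for v in residual) / len(residual)  # uncancelled echo power
    if p_e == 0.0:
        return math.inf       # perfect cancellation
    if p_y == 0.0:
        return -math.inf      # no echo present at the microphone
    return 10.0 * math.log10(p_y / p_e)

# A residual attenuated to one tenth of the echo amplitude is about 20 dB of ERLE.
echo = [1.0, -2.0, 0.5, 3.0]
print(erle_db(echo, [0.1 * v for v in echo]))
```

Under this definition the 30 dB goal corresponds to a residual echo power 1000 times smaller than the microphone input power.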

Due to advances in CMOS process technology,
inexpensive adaptive digital filters are readily available.
The DSP56001 can run up to 830 taps of a TDL-LMS
adaptive filter at a sampling rate of 8 kHz with 24-bit
data and coefficients. As is shown in the following
section, ERLE is a function of many parameters,
including the number of taps and the precision of the
coefficients.

3.0 Echo Cancellation Algorithms 

In this section, two adaptive algorithms are de- 
scribed with particular emphasis on echo cancellation 
applications. 

3.1 Adaptive TDL-LMS algorithm 

Figure 3 shows a block diagram of the adaptive
echo canceler model which uses a TDL structure to
provide adaptive coefficient adjustment. If $h_i(k)$ are
the filter coefficients, $R_{xx}(k)$ is the auto-correlation
matrix of the received line signal x(k) at time k, and
$R_{xy}(k)$ is the cross-correlation vector between the
received signal x(k) and the echo signal y(k), then the
optimum filter coefficient vector that minimizes the
expected value of $e^2(k)$ in Figure 3 is given by [3]

$$H(k) = R_{xx}^{-1}(k)\, R_{xy}(k) \tag{2}$$

where H(k) is an N-element vector consisting of the
filter coefficients at time k as

$$H(k) = [\, h_0(k) \;\; h_1(k) \;\; \cdots \;\; h_{N-1}(k) \,]^T \tag{3}$$

and T denotes matrix transpose. The coefficients
$h_i(k)$ are updated to minimize the error signal
(residual echo), e(k), which is the transmitting line
signal from the echo canceler. e(k) can be expressed as

$$e(k) = y(k) - H^T(k)\, X(k) \tag{4}$$

where X(k) is the input data vector given by

$$X^T(k) = [\, x(k) \;\; x(k-1) \;\; \cdots \;\; x(k-N+1) \,] \tag{5}$$

Figure 3 The tapped-delay-line (TDL) structure

The LMS algorithm, which is one implementation
of the steepest descent method, updates the weight
vector, H(k), at each k via the following relation

$$H(k) = H(k-1) + \mu\, e(k)\, X(k) \tag{6}$$

where $\mu$ denotes the loop-gain factor (convergence
parameter). The adaptive algorithm forces the error
term toward zero. When the error terms are minimized,
the adaptive filter impulse response is said to
have converged to the impulse response of the echo
path.
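
The error computation in (4) and the update in (6) can be sketched in a few lines of Python. This is a floating-point illustration rather than the fixed-point DSP56001 routine; the 4-tap echo path, the step size and the white-noise input are assumptions chosen only to make the example short.

```python
import random

def tdl_lms(x, y, n_taps, mu):
    """TDL-LMS: e(k) = y(k) - H^T X(k), then H <- H + mu * e(k) * X(k)."""
    h = [0.0] * n_taps
    buf = [0.0] * n_taps              # X(k) = [x(k), x(k-1), ..., x(k-N+1)]
    errors = []
    for xk, yk in zip(x, y):
        buf = [xk] + buf[:-1]         # shift the newest sample into the delay line
        e = yk - sum(hi * xi for hi, xi in zip(h, buf))
        h = [hi + mu * e * xi for hi, xi in zip(h, buf)]
        errors.append(e)
    return h, errors

random.seed(0)
echo_path = [0.5, -0.3, 0.2, 0.1]     # hypothetical echo path, not from the paper
x = [random.gauss(0.0, 1.0) for _ in range(4000)]
y = [sum(echo_path[i] * x[k - i] for i in range(4) if k >= i) for k in range(len(x))]
h, err = tdl_lms(x, y, n_taps=4, mu=0.02)
print([round(c, 4) for c in h])
```

With a noise-free echo and a white input, the adapted taps approach the assumed echo path, which is the sense in which the filter impulse response converges to that of the echo path.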

Convergence properties and stability aspects of
the LMS algorithm have been well documented [4,7].
The general condition in practice for the loop-gain
factor is


where $\mathrm{tr}[R_{xx}]$ denotes the trace of $R_{xx}$. The
optimized DSP56001 assembly code for the TDL-LMS
algorithm in (6) can be written as [8]:


        clr   a        x0,x:(r0)+   y:(r4)+,y0   ;clear a, x0=x(n)
        move           x:(r0)+,x1   y:(r4)+,y0   ;x1=x(n-1), y0=h(0)
        do    #N/2,lms                           ;do N/2 times
        mac   x0,y0,a  y0,b         b,y:(r5)+    ;a=h(0)*x(n), b=h(0)
        macr  x1,y1,b  x:(r0)+,x0   y:(r4)+,y0   ;b=h(0)+e*x(n-1)
                                                 ;x0=x(n-2), y0=h(1)
        mac   x1,y0,a  y0,b         b,y:(r5)+    ;a=a+h(1)*x(n-1), b=h(1)
        macr  x0,y1,b  x:(r0)+,x1   y:(r4)+,y0   ;b=h(0)+e*x(n-1)
                                                 ;x0=x(n-3), y0=h(1)
lms
        move           b,y:(r5)+                 ;save new coeffs.
        move  (r0)-n0                            ;pointer update


where r0 is the register pointing to the input buffer,
which is modulo-addressed to accommodate 768 (N)
current data points. The r4 and r5 registers point to
the even- and odd-numbered current adaptive coefficient
locations, respectively. The Modifier Registers
m0, m4 and m5 are set to 767 (N-1), 383 (N/2-1)
and 383 (N/2-1), respectively. This TDL-LMS algorithm
requires only 2N+2 instruction cycles per sample
period. When 768 taps are used for an acoustic
echo canceler, the processing requirement at a sampling
rate of 8 kHz is 12.3 million instructions per second
(MIPS).
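
The 12.3 MIPS figure follows directly from the 2N+2 cycle count. The short Python check below reproduces the arithmetic; the function name is illustrative and it assumes one instruction per cycle count as stated in the text.

```python
def tdl_lms_mips(n_taps, sample_rate_hz):
    """Instruction rate, in MIPS, of a TDL-LMS filter costing 2N+2 cycles per sample."""
    cycles_per_sample = 2 * n_taps + 2
    return cycles_per_sample * sample_rate_hz / 1e6

# 768 taps at 8 kHz: (2*768 + 2) * 8000 = 12,304,000 instructions per second.
print(tdl_lms_mips(768, 8000))
```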

3.2 Adaptive Lattice-LMS Algorithm 

The lattice predictor (often called a one-step
predictor) structure was originally proposed by Itakura
and Saito [9] for speech analysis. The one-step
predictor has also been extended to a noise-canceler
configuration as shown in Figure 4 [5]. If the inputs x(k)
and p(k) are stationary, then it can be shown that the
respective steady-state values of $e^2(k)$ and $s_M^2(k)$ for
the TDL and lattice models are the same. The filter model
in Figure 4 consists of 3 stages, which can be extended
to M stages for the purpose of mathematical analysis. Its
upper half (solid lines) is simply the (one-step) predictor
model [6]. The lower portion (dashed lines) consists
of M additional coefficients, $v'_i$, $1 \le i \le M$. The basic
idea involved in obtaining an adaptive algorithm is to
continuously adjust the lattice weights $v_i(k)$ and
$v'_i(k)$. The $v_i(k)$ are adjusted to minimize the
instantaneous error $e_i^2(k) + w_i^2(k)$ via the one-step
predictor LMS algorithm as

$$v_i(k+1) = v_i(k) + (1-\beta)\,[\, e_i(k)\, w_{i-1}(k) + w_i(k)\, e_{i-1}(k) \,] \tag{8}$$

which we refer to as the lattice-LMS
equation for a one-step predictor. In practice, a
convenient choice for $\beta$ is $\beta = 1 - \mu$, with $\mu$ as in (7).



Figure 4 Lattice structure for noise-canceller 

The lattice filtering computation at each stage
performs a successive orthogonalization. Thus, the
successive coefficients can be optimized independently
of the coefficients at other stages. As a result of this
orthogonalization, the convergence rate of the
lattice-LMS algorithm is not restricted by the eigenvalue
structure of the input signal, which is not the case for
the TDL-LMS algorithm.

Next, another set of coefficients, $v'_i(k)$ in Figure 4,
provides the noise-cancelling subtraction paths. The
individual coefficients are adjusted to minimize the
filter error $s_i^2(k)$ using a technique similar to (6).
Thus we have [10]

$$v'_i(k+1) = v'_i(k) - \mu \left[ \frac{\partial s_i^2(k)}{\partial v'_i(k)} \right], \quad 1 \le i \le M \tag{9}$$

where $\mu$ is a convergence parameter. Again from
Figure 4 it follows that

$$s_i(k) = s_{i-1}(k) - v'_i(k)\, w_{i-1}(k+1) \tag{10}$$

with $s_0(k) = p(k)$. Substitution of (10) into (9) leads
to

$$v'_i(k+1) = v'_i(k) + 2\mu\, s_i(k)\, w_{i-1}(k+1) \tag{11}$$

for $1 \le i \le M$. The DSP56001 assembly code for the
lattice-LMS adaptive filter algorithm in (9)-(11) can
be written as:

dofilt
        move  y:xdatain,b                  ; read in x-input
        move  y:ydatain,a                  ; read in y-input
        move  b,x:(r0)+     a,y:(r6)       ; put input in array
        move  b,x:(r2)                     ; move input to memory
        do    #order,endloop
        move  b,y:(r7)                     ; store previous err. in memory
        move  y:(r5),y0                    ; put v' in y0 for calculation
        move  x:(r2),x1     y:(r6),a       ; put w(n) in x1, s state into a
        macr  -y0,x1,a      y:(r4),y0      ; a=s(n)=s(n-1)-v'*w(n)(n+1), v in y0
        move  x:(r0),a      a,y:(r6)       ; put w(n) in a, store s(n)
        move  b,y1                         ; move error into y1, get e(n-1)
        macr  -y0,y1,a      a,x0           ; a=w(n)-v*e(n)=w(n+1), w(n) into x0
        macr  -x0,y0,b      a,x:(r0)+      ; b=e(n+1)=e(n)-v*w(n), store w(n+1)
        move  a,x:(r2)                     ; store w(n+1)
        move  x:(r3),x1                    ; move 2*mu into x1
        mpy   x1,x0,a       y:(r6),y1      ; a=2*mu*w(n), s state into y1
        move  a,x1          y:(r5),a       ; move a into x1, v' into a
        macr  x1,y1,a       x:(r7+n7),y0   ; v'(n+1)=v'+s*2*mu*w(n)
        move  x:(r0),x1     a,y:(r5)+      ; move w(n+1) into x1, store v'
        mpy   x1,y0,a       b,x1           ; a=w(n+1)*e(n-1)(n-1), e into x1
        macr  x0,x1,a       x:(r3),x0      ; a=w(n+1)*e(n-1)(n-1)+e(n)*w(n), 2*mu in x0
        move  a,x1          y:(r4),a       ; move a into x1, v into a
        macr  x0,x1,a       y:(r7)+,y0     ; a=v(new)=v(old)+2*mu*(a)
        move  a,y:(r4)+                    ; store v(new)
endloop
        move  x:(r0)-,a                    ; output from filter
        move  y:(r6),y0                    ; output s(n)
        move  a,y:filtout
        move  y0,y:errout
        jmp   dofilt

where r0 points to the stored filter values, $w_n$.
The buffer for r0 is 65 (M+1) locations and is modulo
addressed. The extra location is used because new
values of the filter for the next time period are
calculated before they are used in the present time period.
The r4 and r5 registers point to $v$ and $v'$,
respectively. Both are buffers of 64 (M) locations and are
modulo addressed. The r7 register points to the $e_n$ values
and is 128 (2M) locations to store two time periods of
error values. The Modifier Registers are used: m0 is
set to 64 (M); m4 and m5 are set to 63 (M-1); and m7
is set to 127 (2M-1). This stage requires only 1280
instruction cycles per sample period, which yields a
processing requirement of about 10.24 MIPS at a sample
rate of 8 kHz.
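
To make the data flow of (8)-(11) easier to follow than in the assembly listing, here is a floating-point Python sketch of a joint-process lattice with stage-wise LMS updates. It is a sketch under stated assumptions, not a transcription of the DSP56001 routine: Figure 4 is not reproduced here, so the time alignment of the backward errors follows the common gradient-lattice convention (each stage uses the current backward error for cancellation and the one-sample-delayed backward error for prediction). The names `lattice_lms`, `mu_pred` (standing in for 1 - β) and the 4-tap echo path are hypothetical.

```python
import math
import random

def lattice_lms(x, p, stages, mu, mu_pred):
    """Adapt a joint-process lattice on reference x(k) and primary input p(k).

    Returns (v, v_prime, residuals), where residuals holds the final error s_M(k).
    """
    v = [0.0] * stages           # predictor coefficients v_i
    v_prime = [0.0] * stages     # noise-cancelling coefficients v'_i
    w_delayed = [0.0] * stages   # backward errors of the previous sample
    residuals = []
    for xk, pk in zip(x, p):
        e = w = xk               # e_0(k) = w_0(k) = x(k)
        s = pk                   # s_0(k) = p(k)
        w_current = []
        for i in range(stages):
            w_current.append(w)
            # Cancellation path, as in (10) and (11).
            s -= v_prime[i] * w
            v_prime[i] += 2.0 * mu * s * w
            # One-step predictor, as in (8).
            e_next = e - v[i] * w_delayed[i]
            w_next = w_delayed[i] - v[i] * e
            v[i] += mu_pred * (e_next * w_delayed[i] + w_next * e)
            e, w = e_next, w_next
        w_delayed = w_current
        residuals.append(s)
    return v, v_prime, residuals

random.seed(1)
echo_path = [0.5, -0.3, 0.2, 0.1]    # hypothetical short echo path
x = [random.gauss(0.0, 1.0) for _ in range(8000)]
p = [sum(h * x[k - i] for i, h in enumerate(echo_path) if k >= i) for k in range(len(x))]
v, v_prime, s = lattice_lms(x, p, stages=4, mu=0.002, mu_pred=0.0005)
tail_echo = sum(a * a for a in p[-2000:])
tail_res = sum(a * a for a in s[-2000:])
print("ERLE over the last 2000 samples: %.1f dB" % (10.0 * math.log10(tail_echo / tail_res)))
```

Because each stage adapts on its own error rather than on the final error, the uncancelled contributions of later stages act as noise for earlier ones, so the residual does not fall to zero even for a noise-free echo.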

4.0 Echo Characteristics of Car Interior 

The acoustic path can be considered a multi-reflection
medium characterized by the duration of its impulse
response. The typical acoustics inside a car may have
practically an infinite number of reflections, each with
a different acoustical filtering effect, with an
exponentially decaying reverberation superimposed. In
a typical car with reasonable acoustic treatment, it can
take 0.1-0.15 seconds for the reverberation signal level
to decrease by 10-20 dB. However, when the car is moving,
the background noise level due to engine and road noise
may be high enough that the background noise cannot
be distinguished from echoes left uncancelled by an
insufficient number of taps in the adaptive filters.

Echo characteristics can be measured by collect- 
ing reverberation and echo responses synchronized by 
an impulse output signal. The impulse can be generat- 
ed in software and converted into an analog signal by 
a D/A converter followed by amplification to yield 
the audible impulse signal. Using a microphone, the 
residual analog signal can be converted to a digital 
signal by an A/D converter such as the DSP56ADC16 
(16-bit Sigma-Delta A/D converter). Thus, an echo 
signal can be characterized by the impulse response of 
an acoustic chamber, or more precisely, the paths from the
loudspeaker to the microphone. 

In this paper a simulated impulse response is used
in order to characterize the convergence properties of
both the TDL and lattice structures. Figure 5 shows a
simulated impulse response of a medium-size car.
Figure 5 Impulse response of a simulated echo path
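
The paper does not state how the response in Figure 5 was generated. One simple way to obtain a response with the qualitative behavior described above (many reflections under an exponentially decaying envelope) is sketched below in Python. The model, the function name and all default values are assumptions for illustration: random reflection amplitudes are scaled by an envelope that falls by 20 dB over 0.1 second at an 8 kHz sampling rate.

```python
import random

def simulated_echo_path(n_taps=768, sample_rate_hz=8000.0,
                        decay_db=20.0, decay_time_s=0.1, seed=1):
    """Random reflections under an envelope that decays by decay_db in decay_time_s."""
    rng = random.Random(seed)
    db_per_sample = decay_db / (decay_time_s * sample_rate_hz)
    path = []
    for n in range(n_taps):
        envelope = 10.0 ** (-db_per_sample * n / 20.0)   # amplitude envelope at tap n
        path.append(envelope * rng.uniform(-1.0, 1.0))
    return path

h = simulated_echo_path()
print(len(h), max(abs(c) for c in h[:10]), max(abs(c) for c in h[-10:]))
```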

6.0 Simulation Results 

A computer simulation was performed on a SUN- 
3/160 workstation, which modeled the system shown 
in Figure 2. It was assumed that the received signal 
was white Gaussian noise. The TDL and lattice
algorithms in (7), (10) and (11) were used for the
simulation. The variables in the simulation were $\mu$, N,
and M.

It has been shown that the convergence parameter
$\mu$ controls the convergence rate and the mean-square
error (MSE) of the adaptive system [4]. The
constraints on the choice of $\mu$ are given by (7). The
lattice-LMS and TDL-LMS algorithms are compared
in terms of the MSE criterion, which corresponds to the
uncancelled echo. Figure 6 illustrates the MSEs of the
algorithms when $\mu$ = 0.001, N = 768 and M = 64.
Note that the lattice-LMS algorithm converges much
faster than the TDL-LMS algorithm. However, since
the lattice structure used only 64 stages compared to 768
taps for the TDL structure, its uncancelled echo (error)
is much larger than that of the TDL structure.


[Plot: MSE curves labeled TDL (N=768) and lattice]

Figure 6 Mean-square errors for the lattice and TDL

After fast initialization using the lattice-LMS
algorithm as shown in Figure 6, the coefficients are
converted into TDL-LMS coefficients. Consider the
equivalent M TDL taps, defined in (3), obtained from a
set of M lattice coefficients in (8) and (11). When
$\hat{y}_M(k)$ is an estimate of y(k) as shown in Figure 3,
the corresponding error can be written as

$$e_M(k) = y(k) - \hat{y}_M(k) = y(k) - \sum_{i=0}^{M-1} h_{i,M}\, x(k-i) \tag{12}$$

where $h_{i,M}$ denotes the ith equivalent TDL tap when
M is the total number of taps. Minimizing $e_M^2(k)$ with
respect to the $h_{i,M}$ (a new notation for $h_i$ used in
the following derivation), we can derive the following
recursive algorithm to find a set of equivalent TDL
taps using the matrix bordering technique [11]:

$$h_{i,L+1} = h_{i,L} + v'_{L+1}\, a_{L+1-i,L}, \quad i = 0, 1, \ldots, L \tag{13}$$

where

$$a_{i,L+1} = a_{i,L} + v_{L+1}\, a_{L+1-i,L}, \quad i < L+1 \tag{14}$$

and $a_{L+1,L+1} = -v_{L+1}$ for $i = L+1$.

The recursion in (13) and (14) has to extend from
L = 1 through L = M-1 to find a set of M TDL
taps. The remaining N-M coefficients in the TDL
structure should also be initialized with zero before
starting the adaptive process with the TDL-LMS
algorithm.
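
A simple way to check any lattice-to-TDL conversion numerically is to note that a lattice with frozen coefficients is a linear FIR filter, so its equivalent TDL taps are its impulse response. The Python sketch below uses that impulse-response approach instead of the matrix-bordering recursion of (13) and (14); it assumes the common gradient-lattice sign and delay convention, since Figure 4 is not reproduced here, and the function names are illustrative.

```python
def lattice_filter(x, v, v_prime):
    """Frozen (non-adapting) lattice: returns the echo estimate for each sample."""
    stages = len(v)
    w_delayed = [0.0] * stages           # backward errors of the previous sample
    out = []
    for xk in x:
        e = w = xk
        y_hat = 0.0
        w_current = []
        for i in range(stages):
            w_current.append(w)
            y_hat += v_prime[i] * w      # cancellation path of this stage
            # Both right-hand sides use the old e and w of this stage.
            e, w = e - v[i] * w_delayed[i], w_delayed[i] - v[i] * e
        w_delayed = w_current
        out.append(y_hat)
    return out

def lattice_to_tdl(v, v_prime, n_taps):
    """Equivalent TDL taps: the M-sample impulse response, zero-padded to n_taps."""
    m = len(v)
    taps = lattice_filter([1.0] + [0.0] * (m - 1), v, v_prime)
    return taps + [0.0] * (n_taps - m)   # remaining N-M taps start at zero

# With all predictor coefficients at zero, the taps are simply the v' values.
print(lattice_to_tdl([0.0, 0.0], [0.5, -0.3], 4))   # prints [0.5, -0.3, 0.0, 0.0]
```

The zero padding mirrors the initialization of the remaining N-M TDL coefficients described above.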

Figure 7 shows the ERLE plots (defined in (1)). In
order to smooth the output of the adaptive filter, the
following smoothing functions were used [12]:

$$E[y^2(k)] = \rho\, E[y^2(k-1)] + (1-\rho)\, y^2(k) \tag{15}$$

$$E[e^2(k)] = \rho\, E[e^2(k-1)] + (1-\rho)\, e^2(k) \tag{16}$$

where $\rho = 0.99$ is the smoothing parameter. Note that
the ERLE increases very rapidly during the initialization
period. After the adaptive process is converted from the
lattice to the TDL structure at t = 150 ms, the ERLE
increases slowly toward the optimum solution. A total of
10,000 samples, corresponding to 1.25 seconds, were
plotted to show the adaptive process when $\mu$ = 0.001,
N = 768 and M = 64.
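
The recursive power estimates in (15) and (16) amount to first-order exponential averaging, which can be sketched in Python as follows. The zero initial estimates and the function name are assumptions; the paper does not state how the estimates were initialized.

```python
import math

def smoothed_erle_db(y, e, rho=0.99):
    """Track ERLE(k) in dB from exponentially smoothed power estimates."""
    p_y = p_e = 0.0                       # assumed zero initial power estimates
    trace = []
    for yk, ek in zip(y, e):
        p_y = rho * p_y + (1.0 - rho) * yk * yk
        p_e = rho * p_e + (1.0 - rho) * ek * ek
        if p_y > 0.0 and p_e > 0.0:
            trace.append(10.0 * math.log10(p_y / p_e))
        else:
            trace.append(float("nan"))    # ratio undefined until both powers are nonzero
    return trace

# A residual at one tenth of the input amplitude reads as about 20 dB at every step.
print(smoothed_erle_db([1.0] * 5, [0.1] * 5)[-1])
```

With the smoothing parameter at 0.99, each estimate averages over roughly the last 100 samples, or about 12.5 ms at 8 kHz.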


[Plot: ERLE (dB) versus time (ms); curve segments labeled lattice (M=64) and TDL (N=768)]

Figure 7 ERLE of the hybrid structure

7.0 Hardware Implementation Set-up 

A block diagram of the hardware test implementa- 
tion is shown in Figure 8. The SUN-3/160 worksta- 
tion downloads assembled software into the 
DSP56001 Application Development System (ADS) 
which, in turn, controls the Ariel ADC56000 card. 
The ADS contains a DSP56001 general-purpose digital
signal processor chip which runs software in real
time. The Ariel card has dual A/D and D/A converters,
which convert the signals of the loudspeaker,
microphone and the receive/transmit lines. This
implementation allows real-time testing of the adaptive
filter concepts discussed previously.

The DSP56001 is a Harvard-architecture digital
signal processor which has separate program and data
memories as well as buses. It currently executes one
instruction in 75 ns, which means a 768-tap TDL-LMS
adaptive filter can be performed in only 115.2 µs.
This 13.5 MIPS rating is somewhat deceiving because,
due to the dual buses and memories, more than
one operation occurs in each instruction cycle.



Figure 8 Hardware system set-up for experiment

8.0 Conclusions

The feasibility of implementing a full duplex 
hands-free cellular phone using one DSP56001 to 
cancel acoustic echo has been presented. Fast conver- 
gence has been achieved during the initialization 
stage with the lattice-LMS algorithm. After the lattice 
coefficients are converted to the conventional TDL 
structure which has 768 taps, better than 30 dB of 
acoustic ERLE can be theoretically achieved using a 
single DSP56001 by taking advantage of the 24-bit 
coefficient precision. The experimental set-up which 
will be used to verify these predictions was also de- 
scribed. It is hoped that sufficient and fast echo can- 
cellation performance can be achieved by controlling 
the hybrid timing and the convergence parameters 
with this hybrid (lattice-TDL) structure. 